<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
  <head>
    <meta charset="utf-8" />
    <meta name="generator" content="pandoc" />
    <meta
      name="viewport"
      content="width=device-width, initial-scale=1.0, user-scalable=yes"
    />
    <title>(Tutorial)_Reading_and_Writing_Files_in_Python</title>
    <style type="text/css">
      code {
        white-space: pre-wrap;
      }
      span.smallcaps {
        font-variant: small-caps;
      }
      span.underline {
        text-decoration: underline;
      }
      div.column {
        display: inline-block;
        vertical-align: top;
        width: 50%;
      }
    </style>
  </head>
  <body>
    <h1 id="tutorial-reading-and-writing-files-in-python">
      (Tutorial) Reading and Writing Files in Python
    </h1>
    <blockquote>
      <p>
        Learn how to read and write data into flat files, such as CSV, JSON,
        text files, and binary files in Python using io and os modules.
      </p>
    </blockquote>
    <p>
      <a href="https://www.datacamp.com/learn/python/"
        ><img
          src="https://res.cloudinary.com/dyd911kmh/image/upload/f_auto,q_auto:best/v1583330891/python_3_qgciu1.png"
      /></a>
    </p>
    <p>
      As a data scientist, you handle a lot of data daily. This data can come
      from many sources, such as databases, Excel workbooks, flat files, and
      public websites like Kaggle. It can also arrive in many file formats,
      such as <code>.csv</code>, <code>.txt</code>, and <code>.parquet</code>.
      Before you can make sense of the data, you need to know three basic
      things: how to open a file, how to read data from it, and how to write
      data to it, so that you can then analyze it.
    </p>
    <p>You will also learn about the following topics in this tutorial:</p>
    <ul>
      <li>Python file object</li>
      <li>
        How to <code>open</code> a basic flat file like <code>.csv</code>,
        <code>json</code>, etc. and <code>read</code> data from a file
      </li>
      <li>How to write data to a file</li>
      <li>Some Python file object attributes</li>
      <li>The Python <code>os</code> module</li>
      <li>
        The <code>NumPy</code> library and how it can be used to import image
        datasets
      </li>
    </ul>
    <p>
      First, let’s understand the difference between flat files and non-flat
      files.
    </p>
    <h2 id="flat-files-vs.-non-flat-files">Flat Files vs. Non-Flat Files</h2>
    <p>
      Flat files are data files that contain records with no structured
      relationships between the records, and no indexing structure of the kind
      you typically find in relational databases. These files contain only
      basic formatting, have a small fixed number of fields, and may or may not
      follow a specific file format.
    </p>
    <p>
      <img
        src="https://res.cloudinary.com/dyd911kmh/image/upload/f_auto,q_auto:best/v1590365454/read6_ovzuw3.png"
      />
    </p>
    <p>
      <a
        href="https://upload.wikimedia.org/wikipedia/commons/d/dd/Flat_File_Model.svg"
        >Source</a
      >
    </p>
    <p>
      In both flat and non-flat files, though, the data is usually laid out in
      a tabular, row-and-column fashion.
    </p>
    <p>
      A non-flat file is a file where an index is assigned to every record, so
      the exact location of a record can be found using its index. You would
      normally need an application such as a database management system to
      read this type of file.
    </p>
    <p><code>XML</code> is an example of a <code>non-flat</code> file.</p>
    <p>
      A flat file can be a plain text file having a <code>TSV</code>,
      <code>CSV</code> format, or a binary file format. In the former case, the
      files usually contain one record per line:
    </p>
    <ul>
      <li>
        <p>
          Comma Separated Values (CSV) files, which contain data values that are
          separated by <code>,</code> for example:
        </p>
        <pre><code>NAME,ADDRESS,EMAIL
ABC,CITY A,abc@xyz.com
LMN,CITY B,lmn@xyz.com
PQR,CITY C,pqr@xyz.com</code></pre>
      </li>
      <li>
        <p>
          Delimited files, which contain data values with a user-specified
          delimiter. This can be a <code>\t</code> tab or a symbol
          (<code>#</code>,<code>&amp;</code>,<code>||</code>), for example:
        </p>
        <pre><code>NAME||ADDRESS||EMAIL
ABC||CITY A||abc@xyz.com
LMN||CITY B||lmn@xyz.com
PQR||CITY C||pqr@xyz.com</code></pre>
        <p>
          Let’s now understand how Python creates and reads these types of file
          formats having specific delimiters.
        </p>
      </li>
    </ul>
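    <p>
      As a rough illustration of how these two layouts can be parsed, the
      sketch below reads the sample records shown above from in-memory strings
      rather than from real files. It assumes the standard-library
      <code>csv</code> module for the comma-separated case and plain string
      splitting for the <code>||</code> case, because <code>csv.reader</code>
      only accepts a single-character delimiter. The variable names are
      illustrative.
    </p>

```python
import csv
import io

# Sample records from the text above, held in memory instead of on disk.
csv_text = "NAME,ADDRESS,EMAIL\nABC,CITY A,abc@xyz.com\nLMN,CITY B,lmn@xyz.com\n"
pipe_text = "NAME||ADDRESS||EMAIL\nABC||CITY A||abc@xyz.com\nLMN||CITY B||lmn@xyz.com\n"

# csv.reader handles single-character delimiters such as "," or "\t".
csv_rows = list(csv.reader(io.StringIO(csv_text)))

# A multi-character delimiter like "||" is split manually, line by line.
pipe_rows = [line.split("||") for line in pipe_text.splitlines()]

print(csv_rows[1])   # ['ABC', 'CITY A', 'abc@xyz.com']
print(pipe_rows[1])  # ['ABC', 'CITY A', 'abc@xyz.com']
```

    <p>
      Both approaches yield the same list of rows here; for real CSV data with
      quoted fields, the <code>csv</code> module is usually the safer choice.
    </p>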
    <h2 id="python-file-objects">Python File Objects</h2>
    <p>
      Python has built-in functions to create, read, write, and manipulate
      files. The <code>io</code> module is the default module for accessing
      files, and its main entry point, <code>open()</code>, is available as a
      built-in function without importing anything. Before you read, write, or
      manipulate a file, you call <code>open(filename, access_mode)</code>,
      which returns a file object, often called a “handle”. You can then use
      this handle to read from or write to the file. Like everything else in
      Python, a file is an object with its own attributes and methods.
    </p>
    <p>
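      An <code>OSError</code> (of which <code>IOError</code> is an alias in
      Python 3) is raised if opening the file fails, for example because the
      file does not exist or you lack permission to access it. Other kinds of
      misuse raise different errors: reading from a file opened in write-only
      mode raises <code>io.UnsupportedOperation</code>, and using a file that
      is already closed raises <code>ValueError</code>.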
    </p>
    <p>
      As you already read before, there are two types of flat files, text and
      binary files:
    </p>
    <ul>
      <li>
        As you might have expected from reading the previous section, text files
        have an End-Of-Line (EOL) character to indicate each line’s termination.
        In Python, the new line character (<code>\n</code>) is the default EOL
        terminator.
      </li>
      <li>
        Binary files store data as raw bytes (0s and 1s) rather than as lines
        of text, so there is no EOL character. Reading such a file returns
        bytes. Binary mode is used when dealing with non-text files such as
        images, <code>.exe</code>, or <code>.pyc</code>.
      </li>
    </ul>
    <p>
      Let’s now understand the Python file objects in detail, along with
      necessary examples.
    </p>
    <h3 id="open">Open()</h3>
    <p>
      The built-in Python function <code>open()</code> has the following
      signature:
      <code>open(file, mode='r', buffering=-1, encoding=None, errors=None,
      newline=None, closefd=True, opener=None)</code>. The
      <code>open()</code> function has eight parameters, and all but the first
      have the default values shown in the signature.
    </p>
    <p>
      You will focus on the first and second parameters for now, which are
      essential for reading and writing files, and go through the other
      parameters one by one as the tutorial progresses.
    </p>
    <p>Let’s understand the first argument, i.e., <code>file</code>.</p>
    <h3 id="file">file</h3>
    <p>
      <code>file</code> is the only mandatory argument of the
      <code>open</code> function; all the other arguments are optional and
      fall back to their default values.
    </p>
    <p>
      To put it simply, the <code>file</code> argument represents the path where
      your file resides in your system.
    </p>
    <p>
      If the file is in the current working directory, you can just provide the
      filename, for example
      <code>my_file_handle=open(&quot;mynewtextfile.txt&quot;)</code>. If the
      file resides in a directory other than the current directory, you have to
      provide the path (absolute, or relative to the working directory)
      together with the file name:
    </p>
    <pre><code>my_file_handle=open(&quot;D://test.txt&quot;)
my_file_handle.read()


&quot;Welcome to DataCamp&#39;s Tutorial on Reading and Writing Files in Python!&quot;</code></pre>
    <p>
      Make sure the file name and path are correct; otherwise, you’ll get a
      <code>FileNotFoundError</code>:
    </p>
    <pre><code>my_file_handle=open(&quot;folder/test.txt&quot;)
my_file_handle.read()


---------------------------------------------------------------------------

FileNotFoundError                         Traceback (most recent call last)

&lt;ipython-input-2-a0d1ea891918&gt; in &lt;module&gt;
----&gt; 1 my_file_handle=open(&quot;folder/test.txt&quot;)
      2 my_file_handle.read()


FileNotFoundError: [Errno 2] No such file or directory: &#39;folder/test.txt&#39;</code></pre>
    <h4 id="exception-handling-in-files">Exception Handling in files</h4>
    <p>You can catch the exception with a try-except-finally block:</p>
    <pre><code>try:
    my_file_handle=open(&quot;folder/test.txt&quot;)
except IOError:
    print(&quot;File not found or path is incorrect&quot;)
finally:
    print(&quot;exit&quot;)


File not found or path is incorrect
exit</code></pre>
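    <p>
      A slightly more specific variant is sketched below. It assumes you want
      to tell a missing file apart from other operating-system errors, and it
      uses a hypothetical helper named <code>read_text_or_none</code> plus a
      temporary directory so that the missing path is guaranteed not to exist.
    </p>

```python
import os
import tempfile


def read_text_or_none(path):
    """Return the file's text, or None if the file cannot be opened."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except FileNotFoundError:
        # Raised when the file or one of its parent folders does not exist.
        print("File not found or path is incorrect:", path)
    except OSError as exc:
        # Any other operating-system level failure, e.g. missing permissions.
        print("Could not open file:", exc)
    return None


with tempfile.TemporaryDirectory() as folder:
    missing_path = os.path.join(folder, "folder", "test.txt")
    result = read_text_or_none(missing_path)

print(result)  # None
```

    <p>
      <code>FileNotFoundError</code> is listed first because it is a subclass
      of <code>OSError</code>; the more general handler would otherwise catch
      it.
    </p>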
    <p>
      Let’s understand the second argument of the <code>open</code> function,
      i.e., <code>access modes</code>.
    </p>
    <h3 id="access-modes">Access Modes</h3>
    <p>
      Access modes define in which way you want to open a file, whether you want
      to open a file in:
    </p>
    <ul>
      <li>read-only mode</li>
      <li>write-only mode</li>
      <li>append mode</li>
      <li>both read and write mode</li>
    </ul>
    <p>
      Though many access modes exist, as shown in the table below, the most
      commonly used ones are the read and write modes. The mode also determines
      where in the file reading or writing starts.
    </p>
    <p>
      You use <code>'r'</code>, the default mode, to read the file. In other
      cases where you want to write or append, you use <code>'w'</code> or
      <code>'a'</code>, respectively.
    </p>
    <p>
      There are, of course, more access modes! Take a look at the following
      table:
    </p>
    <p>
      <img
        src="https://res.cloudinary.com/dyd911kmh/image/upload/f_auto,q_auto:best/v1590365454/read2_tdryms.png"
      />
    </p>
    <p>
      As you have seen in the first section, there are two types of flat files.
      This is also why there’s an option to specify which format you want to
      open, such as text or binary. Of course, the former is the default. When
      you add <code>'b'</code> to the access modes, you can read the file in
      binary format rather than the default text format. It is used when the
      file to be accessed is not in text format.
    </p>
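    <p>
      The following minimal sketch runs through the most common modes on a
      throwaway file. The file name <code>modes_demo.txt</code> and the
      temporary directory are assumptions made for illustration only.
    </p>

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "modes_demo.txt")

    # "w" creates the file, or truncates it if it already exists.
    with open(path, "w", encoding="utf-8") as f:
        f.write("first\n")

    # "a" keeps the existing content and writes at the end.
    with open(path, "a", encoding="utf-8") as f:
        f.write("second\n")

    # "r" (the default) reads text and returns str.
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()

    # Adding "b" reads the same file as raw bytes.
    with open(path, "rb") as f:
        raw = f.read()

print(type(text).__name__, repr(text))
print(type(raw).__name__)
```

    <p>
      The text read back is <code>'first\nsecond\n'</code>, while the binary
      read returns a <code>bytes</code> object whose exact line endings depend
      on the platform that wrote the file.
    </p>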
    <h4 id="reading-from-a-file">Reading from a file</h4>
    <p>
      Let’s try out all the reading methods for reading from a file, and you
      will also explore the access modes along with it! There are three ways to
      read from a file.
    </p>
    <ul>
      <li><code>read([n])</code></li>
      <li><code>readline([n])</code></li>
      <li><code>readlines()</code></li>
    </ul>
    <p>
      Here <code>n</code> is the number of characters to be read in text mode
      (or bytes in binary mode). If nothing is passed to <code>n</code>,
      <code>read()</code> reads the complete file and
      <code>readline()</code> reads a complete line.
    </p>
    <p>
      Create a file named <code>test1.txt</code> with the following contents:
    </p>
    <pre><code>1st line
2nd line
3rd line
4th line
5th line</code></pre>
    <p>
      Let’s understand what each read method does:
    </p>
    <pre><code>my_file=open(&quot;test1.txt&quot;,&quot;r&quot;)
print(my_file.read())


1st line
2nd line
3rd line
4th line
5th line</code></pre>
    <p>
      The <code>read()</code> method returns the entire file if the number of
      characters (<code>n</code>) is not given in the argument. If you execute
      <code>my_file.read(3)</code>, you will get back the first three characters
      of the file, as shown below:
    </p>
    <pre><code>my_file=open(&quot;test1.txt&quot;,&quot;r&quot;)
print(my_file.read(3))


1st</code></pre>
    <p>
      <code>readline(n)</code> returns at most <code>n</code> characters of a
      single line of a file. It does not read more than one line.
    </p>
    <pre><code>my_file.close()
my_file=open(&quot;test1.txt&quot;,&quot;r&quot;)

print(my_file.readline())

print(my_file.readline(2))


1st line

2n</code></pre>
    <h3 id="closing-python-files-with-close">
      Closing Python Files with close()
    </h3>
    <p>
      Use the <code>close()</code> method of the file handle to close the file.
      Calling it flushes any buffered data and releases the file.
    </p>
    <pre><code>my_file.close()</code></pre>
    <p>You can use a <code>for</code> loop to read the file line by line:</p>
    <pre><code>my_file=open(&quot;test1.txt&quot;,&quot;r&quot;)

for line in my_file:
    print(line)
my_file.close()


1st line

2nd line

3rd line

4th line

5th line</code></pre>
    <p>
      The <code>readlines()</code> method returns a list containing each line
      in the file, which can be iterated over using a for loop:
    </p>
    <pre><code>my_file=open(&quot;test1.txt&quot;,&quot;r&quot;)
my_file.readlines()


[&#39;1st line\n&#39;, &#39;2nd line\n&#39;, &#39;3rd line\n&#39;, &#39;4th line\n&#39;, &#39;5th line&#39;]</code></pre>
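    <p>
      The reading methods above can also be combined on one handle, because
      each call continues from the current cursor position. The sketch below
      assumes a small three-line copy of <code>test1.txt</code> written to a
      temporary directory, and also shows one way to strip the trailing newline
      so that <code>print()</code> does not produce blank lines between
      records.
    </p>

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "test1.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("1st line\n2nd line\n3rd line\n")

    with open(path, encoding="utf-8") as f:
        first_three = f.read(3)      # first three characters
        rest_of_line = f.readline()  # remainder of line 1, newline included
        remaining = f.readlines()    # all lines that are left, as a list

    # Iterating over the handle yields lines; rstrip removes the newline.
    with open(path, encoding="utf-8") as f:
        stripped = [line.rstrip("\n") for line in f]

print(repr(first_three))   # '1st'
print(repr(rest_of_line))  # ' line\n'
print(remaining)           # ['2nd line\n', '3rd line\n']
print(stripped)            # ['1st line', '2nd line', '3rd line']
```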
    <h3 id="writing-to-a-file">Writing to a file</h3>
    <p>You can use two methods to write to a file in Python:</p>
    <ul>
      <li>
        <code>write(string)</code> (for text) or
        <code>write(byte_string)</code> (for binary)
      </li>
      <li><code>writelines(list)</code></li>
    </ul>
    <p>
      Let’s create a new file. Opening a file in <code>w</code> mode creates it
      if it does not exist and overwrites its contents if it does. Remember to
      give the correct path and filename; otherwise, you will get an error or
      write to an unintended location.
    </p>
    <p>
      If you prefer to start from an existing file, create one in a text editor
      such as Notepad, save it with a <code>.txt</code> extension in Python’s
      working directory, and keep in mind that <code>w</code> mode will replace
      its contents.
    </p>
    <pre><code>new_file=open(&quot;newfile.txt&quot;,mode=&quot;w&quot;,encoding=&quot;utf-8&quot;)


new_file.write(&quot;Writing to a new file\n&quot;)
new_file.write(&quot;Writing to a new file\n&quot;)
new_file.write(&quot;Writing to a new file\n&quot;)
new_file.close()</code></pre>
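    <p>
      One detail worth checking for yourself is that
      <code>writelines()</code> does not add line separators. The sketch below
      assumes a hypothetical <code>fruits.txt</code> in a temporary directory
      and compares writing a list as-is with writing it with an explicit
      <code>\n</code> after each item.
    </p>

```python
import os
import tempfile

fruits = ["Orange", "Banana", "Apple"]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "fruits.txt")

    # writelines() writes the strings back to back, with nothing in between.
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(fruits)
    with open(path, encoding="utf-8") as f:
        joined = f.read()

    # Appending "\n" to every item puts each one on its own line.
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(fruit + "\n" for fruit in fruits)
    with open(path, encoding="utf-8") as f:
        separated = f.read()

print(repr(joined))     # 'OrangeBananaApple'
print(repr(separated))  # 'Orange\nBanana\nApple\n'
```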
    <h4 id="append-mode">Append Mode</h4>
    <p>Now let’s write a list to this file with <code>a+</code> mode:</p>
    <pre><code>fruits=[&quot;Orange\n&quot;,&quot;Banana\n&quot;,&quot;Apple\n&quot;]
new_file=open(&quot;newfile.txt&quot;,mode=&quot;a+&quot;,encoding=&quot;utf-8&quot;)
new_file.writelines(fruits)
for line in new_file:
    print(line)
new_file.close()</code></pre>
    <h4 id="seek-method">Seek Method</h4>
    <p>
      Note that the loop above does not print anything, because after writing
      the file cursor is at the end of the file. To move the cursor back to the
      beginning, you can use the <code>seek()</code> method of the file object:
    </p>
    <pre><code>cars=[&quot;Audi\n&quot;,&quot;Bentley\n&quot;,&quot;Toyota\n&quot;]
new_file=open(&quot;newfile.txt&quot;,mode=&quot;a+&quot;,encoding=&quot;utf-8&quot;)
for car in cars:
    new_file.write(car)
print(&quot;Tell the byte at which the file cursor is:&quot;,new_file.tell())
new_file.seek(0)
for line in new_file:
    print(line)


Tell the byte at which the file cursor is: 115
Writing to a new file

Writing to a new file

Writing to a new file

Orange

Banana

Apple

Audi

Bentley

Toyota</code></pre>
    <p>
      The <code>tell()</code> method of a file object reports the byte at which
      the file cursor is located. In <code>seek(offset,reference_point)</code>,
      the reference points are <code>0</code> (the beginning of the file, which
      is the default), <code>1</code> (the current position in the file), and
      <code>2</code> (the end of the file).
    </p>
    <p>
      Let’s try passing a non-zero offset together with an explicit reference
      point and see the output:
    </p>
    <pre><code>new_file.seek(4,0)
print(new_file.readline())
new_file.close()


ing to a new file</code></pre>
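    <p>
      To see all three reference points in action, the file has to be opened in
      binary mode, since text mode restricts seeks relative to the current
      position and the end. The sketch below assumes a ten-byte scratch file
      named <code>seek_demo.bin</code>, a name chosen only for this
      illustration.
    </p>

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "seek_demo.bin")
    with open(path, "wb") as f:
        f.write(b"0123456789")

    with open(path, "rb") as f:
        f.seek(4, 0)              # 4 bytes from the beginning
        from_start = f.read(2)    # reads b"45", cursor is now at byte 6

        f.seek(2, 1)              # 2 bytes forward from the current position
        from_current = f.read(1)  # reads b"8", cursor is now at byte 9

        f.seek(-3, 2)             # 3 bytes before the end of the file
        from_end = f.read()       # reads b"789"
        position = f.tell()       # cursor is at the end: 10

print(from_start, from_current, from_end, position)
```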
    <h4 id="next-method">next Method</h4>
    <p>
      One more note on <code>seek()</code> first: seeks relative to the end of
      the file, such as <code>seek(-2,2)</code>, are not allowed if the file
      mode does not include <code>'b'</code>, which indicates binary format.
      When the file object is treated as a text file, the only end-relative
      seek allowed is <code>seek(0,2)</code>, which jumps to the end of the
      file.
    </p>
    <p>
      The only method left is <code>next()</code>, so let’s complete this
      section of the tutorial! Here you are using the file named
      <code>test1.txt</code> that you created earlier.
    </p>
    <pre><code>file=open(&quot;test1.txt&quot;,&quot;r&quot;)
for index in range(5):
    line=next(file)
    print(line)
file.close()


1st line

2nd line

3rd line

4th line

5th line</code></pre>
    <p>
      <strong>Note</strong>: the <code>write()</code> method does not
      necessarily write data to the file right away. It writes to a buffer,
      and the buffered data reaches the file when the buffer fills up or when
      the <code>close()</code> method is called, which flushes the buffer. If
      you do not want to close the file yet, call the
      <code>fileObject.flush()</code> method to flush the buffer and write its
      content to the file.
    </p>
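    <p>
      The effect of buffering can be observed by checking the file size on disk
      before and after <code>flush()</code>. This is only a sketch: the file
      name <code>flush_demo.txt</code> is made up, and the size before flushing
      is typically <code>0</code> for such a small write but depends on the
      buffer size, so the code does not rely on it.
    </p>

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "flush_demo.txt")

    f = open(path, "w", encoding="utf-8")
    f.write("buffered text")

    # The data usually still sits in the buffer at this point.
    size_before_flush = os.path.getsize(path)

    f.flush()  # push the buffered data to the file without closing it
    size_after_flush = os.path.getsize(path)

    f.close()

print("Before flush:", size_before_flush)
print("After flush:", size_after_flush)  # 13
```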
    <h3 id="importing-the-moby-dick-novel">Importing the Moby Dick Novel</h3>
    <p>
      Moby Dick is an 1851 novel by American writer Herman Melville. You’ll be
      working with the file moby_dick.txt. It is a text file that contains the
      opening sentences of Moby Dick, one of the great American novels! Here
      you’ll get experience opening a text file, printing its contents, and,
      finally, closing it.
    </p>
    <p>
      You can download the Moby Dick text file from
      <a
        href="https://github.com/wblakecannon/DataCamp/blob/master/05-importing-data-in-python-(part-1)/_datasets/moby_dick.txt"
        >here</a
      >.
    </p>
    <p>You will do the following things:</p>
    <ul>
      <li>
        Open the moby_dick.txt file in read-only mode and store it in the
        variable file
      </li>
      <li>Print the contents of the file</li>
      <li>Check whether the file is closed</li>
      <li>Close the file using the close() method</li>
      <li>Check again whether the file is closed</li>
    </ul>
    <pre><code>file = open(&#39;moby_dick.txt&#39;, &#39;r&#39;)

print(file.read())
print(&#39;&#39;)

print(&#39;Is the file closed?:&#39;, file.closed)

file.close()
print(&#39;&#39;)

print(&#39;Is the file closed?:&#39;, file.closed)


CHAPTER 1. Loomings.

Call me Ishmael. Some years ago--never mind how long precisely--having
little or no money in my purse, and nothing particular to interest me
on shore, I thought I would sail about a little and see the watery
part of the world. It is a way I have of driving off the spleen and
regulating the circulation. Whenever I find myself growing grim about
the mouth; whenever it is a damp, drizzly November in my soul;
whenever I find myself involuntarily pausing before coffin warehouses,
and bringing up the rear of every funeral I meet; and especially
whenever my hypos get such an upper hand of me, that it requires a
strong moral principle to prevent me from deliberately stepping into
the street, and methodically knocking people&#39;s hats off--then, I
account it high time to get to sea as soon as I can. This is my
substitute for pistol and ball. With a philosophical flourish Cato
throws himself upon his sword; I quietly take to the ship. There is
nothing surprising in this. If they but knew it, almost all men in
their degree, some time or other, cherish very nearly the same
feelings towards the ocean with me.

Is the file closed?: False

Is the file closed?: True</code></pre>
    <h4 id="reading-the-moby-dick-novel-using-context-manager">
      Reading the Moby Dick Novel using Context Manager
    </h4>
    <p>
      You can bind a file object to a name with a context manager (the
      <code>with</code> statement), and then you don’t need to worry about
      closing the file. The file is closed automatically when the
      <code>with</code> block ends, so it can no longer be read from or written
      to outside the block.
    </p>
    <p>
      Let’s print the first three lines of the Moby Dick text file using the
      <code>readline()</code> method. Note that the file is opened in
      <code>read</code> mode by default.
    </p>
    <pre><code>with open(&#39;moby_dick.txt&#39;) as file:
    print(file.readline())
    print(file.readline())
    print(file.readline())


CHAPTER 1. Loomings.



Call me Ishmael. Some years ago--never mind how long precisely--having</code></pre>
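    <p>
      The claim that the file is closed for you holds even when an exception
      interrupts the block. The sketch below assumes a one-line stand-in for
      <code>moby_dick.txt</code> in a temporary directory and raises a
      simulated error inside the <code>with</code> block to check this.
    </p>

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "moby_dick.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("Call me Ishmael.\n")

    try:
        with open(path, encoding="utf-8") as handle:
            first_line = handle.readline()
            raise RuntimeError("simulated failure while processing")
    except RuntimeError:
        pass

    # The context manager closed the file despite the exception.
    closed_after_error = handle.closed

    # Using the closed handle raises ValueError.
    try:
        handle.read()
        error_message = None
    except ValueError as exc:
        error_message = str(exc)

print(closed_after_error)  # True
print(error_message)
```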
    <h4 id="writing-to-a-json-file">Writing to a JSON File</h4>
    <p>You can also write your data to <code>.json</code> files.</p>
    <p>
      Remember: JavaScript Object Notation (JSON) has become a popular format
      for exchanging structured information over a network and sharing
      information across platforms. It is essentially text with some
      structure, and saving it as <code>.json</code> signals how that structure
      should be read; otherwise, it is just a plain text file. It stores data
      as key: value pairs, and the structure can range from simple to complex.
    </p>
    <p>
      Take a look at the following simple JSON for countries and their capitals:
    </p>
    <pre><code>{
&quot;Algeria&quot;:&quot;Algiers&quot;,
&quot;Andorra&quot;:&quot;Andorra la Vella&quot;,
&quot;Nepal&quot;:&quot;Kathmandu&quot;,
&quot;Netherlands&quot;:&quot;Amsterdam&quot;
}</code></pre>
    <p>
      A JSON object consists of a collection of <code>key: value</code> pairs,
      as shown in the code cell above: anything before <code>:</code> is called
      the key, and anything after <code>:</code> is called the value. This is
      very similar to Python dictionaries, isn’t it! You can see that the pairs
      are separated by <code>,</code> and that curly braces define objects.
      Square brackets are used to define arrays in more complex JSON files, as
      you can see in the following excerpt:
    </p>
    <pre><code>{
  &quot;colors&quot;: [
    {
      &quot;color&quot;: &quot;black&quot;,
      &quot;category&quot;: &quot;hue&quot;,
      &quot;type&quot;: &quot;primary&quot;,
      &quot;code&quot;: {
        &quot;rgba&quot;: [0,0,0,1],
        &quot;hex&quot;: &quot;#000&quot;
      }
    },
    {
      &quot;color&quot;: &quot;white&quot;,
      &quot;category&quot;: &quot;value&quot;,
      &quot;code&quot;: {
        &quot;rgba&quot;: [255,255,255,1],
        &quot;hex&quot;: &quot;#FFF&quot;
      }
    },
    {
      &quot;color&quot;: &quot;red&quot;,
      &quot;category&quot;: &quot;hue&quot;,
      &quot;type&quot;: &quot;primary&quot;,
      &quot;code&quot;: {
        &quot;rgba&quot;: [255,0,0,1],
        &quot;hex&quot;: &quot;#F00&quot;
      }
    },
    {
      &quot;color&quot;: &quot;blue&quot;,
      &quot;category&quot;: &quot;hue&quot;,
      &quot;type&quot;: &quot;primary&quot;,
      &quot;code&quot;: {
        &quot;rgba&quot;: [0,0,255,1],
        &quot;hex&quot;: &quot;#00F&quot;
      }
    },
    {
      &quot;color&quot;: &quot;yellow&quot;,
      &quot;category&quot;: &quot;hue&quot;,
      &quot;type&quot;: &quot;primary&quot;,
      &quot;code&quot;: {
        &quot;rgba&quot;: [255,255,0,1],
        &quot;hex&quot;: &quot;#FF0&quot;
      }
    },
    {
      &quot;color&quot;: &quot;green&quot;,
      &quot;category&quot;: &quot;hue&quot;,
      &quot;type&quot;: &quot;secondary&quot;,
      &quot;code&quot;: {
        &quot;rgba&quot;: [0,255,0,1],
        &quot;hex&quot;: &quot;#0F0&quot;
      }
    }
  ]
}</code></pre>
    <p>
      Note that JSON files can hold different data types in one object as well!
    </p>
    <p>
      When you read the file with <code>read()</code>, you read strings from a
      file. That means that when you read numbers, you need to convert them
      with data type conversion functions like <code>int()</code>. For more
      complex use cases, you can always use the <code>json</code> module.
    </p>
    <p>
      If you have an object such as <code>my_data</code> below, you can view
      its JSON string representation with <code>json.dumps()</code>:
    </p>
    <pre><code>import json
my_data=[&quot;Reading and writing files in python&quot;,78546]
json.dumps(my_data)


&#39;[&quot;Reading and writing files in python&quot;, 78546]&#39;</code></pre>
    <p>
      To write the JSON in a file, you can use the <code>.dump()</code> method:
    </p>
    <pre><code>with open(&quot;jsonfile.json&quot;,&quot;w&quot;) as f:
    json.dump(my_data,f)</code></pre>
    <p>
      Note: It is good practice to open files with the
      <code>with open(...)</code> statement, because it closes the file
      properly even if an exception is raised along the way. An explicit
      <code>close()</code> call is therefore not needed.
    </p>
    <p>
      Let’s now open the <code>JSON</code> file you created using the
      <code>dump</code> method. If a <code>JSON</code> file is opened for
      reading, you can decode it with <code>load(file)</code> as follows:
    </p>
    <pre><code>with open(&quot;jsonfile.json&quot;,&quot;r&quot;) as f:
    jsondata=json.load(f)
    print(jsondata)


[&#39;Reading and writing files in python&#39;, 78546]</code></pre>
    <p>
      Similarly, more complex dictionaries can be stored using the
      <code>json</code> module. You can find more information
      <a href="https://docs.python.org/3/library/json.html#module-json">here</a
      >.
    </p>
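    <p>
      As a small sketch of such a round trip, the example below stores a nested
      dictionary and loads it back. The dictionary contents and the file name
      <code>capitals.json</code> are invented for illustration; the point is
      that numbers, booleans, lists, and <code>None</code> keep their types
      without manual conversion.
    </p>

```python
import json
import os
import tempfile

record = {
    "capitals": {"Algeria": "Algiers", "Nepal": "Kathmandu"},
    "count": 2,
    "visited": [True, False],
    "note": None,
}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "capitals.json")

    # indent=2 only affects readability of the file, not its content.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f, indent=2)

    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)

print(loaded == record)               # True
print(type(loaded["count"]).__name__)  # int
```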
    <p>
      Now, you will look at the other parameters of the <code>open()</code>
      function, whose signature you saw in a previous section. Let’s start with
      <code>buffering</code>.
    </p>
    <h3 id="buffering">Buffering</h3>
    <p>
      A buffer holds a chunk of data from the operating system’s file stream
      until it is used, at which point more data comes in. This is similar to
      video buffering.
    </p>
    <p>
      Buffering is useful when you don’t know the size of the file you are
      working with: if the file is larger than the available memory, loading it
      all at once will not work properly. The buffer size determines how much
      data can be held at a time until it is used.
      <code>io.DEFAULT_BUFFER_SIZE</code> tells you the default buffer size of
      your platform.
    </p>
    <p>
      Optionally, you can pass an integer to <code>buffering</code> to set the
      buffering policy:
    </p>
    <ul>
      <li>
        <code>0</code> to switch off buffering (only allowed in binary mode)
      </li>
      <li>
        <code>1</code> to select line buffering (only usable in text mode)
      </li>
      <li>
        Any integer that is bigger than <code>1</code> to indicate the size in
        bytes of a fixed-size chunk buffer
      </li>
      <li>
        Use negative values to set the buffering policy to the system default
      </li>
    </ul>
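    <p>
      The policies in the list above can be checked by inspecting the object
      that <code>open()</code> returns. The sketch below assumes CPython’s
      standard <code>io</code> classes and a scratch file called
      <code>buffer_demo.txt</code>, which is a made-up name.
    </p>

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "buffer_demo.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("1st line\n2nd line\n")

    # buffering=0 (binary mode only) returns the raw, unbuffered file object.
    with open(path, "rb", buffering=0) as raw:
        raw_type = type(raw).__name__

    # The default policy wraps the raw file in a fixed-size chunk buffer.
    with open(path, "rb") as buffered:
        buffered_type = type(buffered).__name__

    # buffering=1 selects line buffering in text mode.
    with open(path, "r", encoding="utf-8", buffering=1) as text:
        line_buffered = text.line_buffering

    # Switching buffering off in text mode is rejected with a ValueError.
    try:
        open(path, "r", buffering=0)
        unbuffered_text_allowed = True
    except ValueError:
        unbuffered_text_allowed = False

print(raw_type, buffered_type, line_buffered, unbuffered_text_allowed)
```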
    <p>When you don’t specify any policy, the default is:</p>
    <ul>
      <li>
        Binary files are buffered in fixed-size chunks. The size of the buffer
        is chosen depending on the underlying device’s “block size”. On many
        systems, the buffer will typically be 4096 or 8192 bytes long.
      </li>
      <li>
        “Interactive” text files (files for which
        <code>isatty()</code> returns <code>True</code>) use line buffering.
        Other text files use the policy described above for binary files. Note
        that <code>isatty()</code> can be used to see if you’re connected to a
        teletypewriter-like (TTY) device.
      </li>
    </ul>
    <pre><code>import io
print(&quot;Default buffer size:&quot;,io.DEFAULT_BUFFER_SIZE)
file=open(&quot;test1.txt&quot;,mode=&quot;r&quot;,buffering=5)
print(file.line_buffering)
file_contents=file.buffer
for line in file_contents:
    print(line)
file.close()


Default buffer size: 8192
False
b&#39;1st line&#39;
b&#39;2nd line&#39;
b&#39;3rd line&#39;
b&#39;4th line&#39;
b&#39;5th line&#39;</code></pre>
    <p>
      <strong>Note</strong> that if you are using all arguments in the order
      that is specified in
      <code
        >open(file, mode='r', buffering=-1, encoding=None, errors=None,
        newline=None, closefd=True, opener=None)</code
      >, you don’t need to write the argument names. If you skip some
      arguments because you want to keep their default values, pass the
      remaining ones by name, for example
      <code>open(&quot;test1.txt&quot;, encoding=&quot;utf-8&quot;)</code>.
    </p>
    <h3 id="errors">Errors</h3>
    <p>
      <code>errors</code> is an optional string that specifies how encoding and
      decoding errors are to be handled. This argument cannot be used in binary
      mode. A variety of standard error handlers are available (listed under
      Error Handlers in the Python documentation).
    </p>
    <pre><code>file=open(&quot;test1.txt&quot;,mode=&quot;r&quot;,errors=&quot;strict&quot;)
print(file.read())
file.close()


1st line
2nd line
3rd line
4th line
5th line</code></pre>
    <p>
      <code>errors="strict"</code> raises a <code>ValueError</code> exception
      (more precisely, one of its Unicode error subclasses) if there is an
      encoding or decoding error.
    </p>
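    <p>
      To compare a few handlers side by side, the sketch below writes UTF-8
      bytes that are not valid ASCII and then reads them back with
      <code>encoding="ascii"</code> under three different
      <code>errors</code> settings. The file name
      <code>errors_demo.txt</code> is hypothetical.
    </p>

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "errors_demo.txt")

    # "caf\u00e9" is "cafe" with an accented e; in UTF-8 the last
    # character takes two bytes, neither of which is valid ASCII.
    with open(path, "wb") as f:
        f.write("caf\u00e9".encode("utf-8"))

    # strict (the default behaviour) raises on the first invalid byte.
    try:
        with open(path, encoding="ascii", errors="strict") as f:
            f.read()
        strict_error = None
    except UnicodeDecodeError as exc:
        strict_error = exc

    # replace substitutes U+FFFD for every byte that cannot be decoded.
    with open(path, encoding="ascii", errors="replace") as f:
        replaced = f.read()

    # ignore silently drops the undecodable bytes.
    with open(path, encoding="ascii", errors="ignore") as f:
        ignored = f.read()

print(type(strict_error).__name__)  # UnicodeDecodeError
print(ascii(replaced))              # 'caf\ufffd\ufffd'
print(ascii(ignored))               # 'caf'
```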
    <h3 id="newline">Newline</h3>
    <p>
      <code>newline</code> controls how universal newlines mode works (it only
      applies to text mode). It can be <code>None</code>, <code>''</code>,
      <code>'\n'</code>, <code>'\r'</code>, or <code>'\r\n'</code>. In the
      example at the end of this section, you will see that passing
      <code>None</code> to <code>newline</code> translates
      <code>'\r\n'</code> to <code>'\n'</code>.
    </p>
    <ul>
      <li>
        <strong>None</strong>: universal newlines mode is enabled. Lines in the
        input can end in <code>'\n'</code>, <code>'\r'</code>, or
        <code>'\r\n'</code>, and these are translated into
        <code>'\n'</code> before being returned to the caller
      </li>
      <li>
        <strong>''</strong> (empty string): universal newlines mode is enabled,
        but line endings are returned untranslated
      </li>
      <li>
        <strong>'\n', '\r', '\r\n'</strong>: input lines are only terminated by
        the given string, and the line ending is returned untranslated
      </li>
    </ul>
    <p>
      Note that universal newlines are a manner of interpreting text streams in
      which all of the following are recognized as ending a line: the Unix
      end-of-line convention ‘\n’, the Windows convention ‘\r\n’, and the old
      Macintosh convention ‘\r’.
    </p>
    <p>
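      Note also that <code>os.linesep</code> holds the system’s default line
      separator. The following example reads a file with Windows line endings,
      first with <code>newline=""</code> and then with
      <code>newline=None</code>: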
    </p>
    <pre><code>file=open(&quot;test1.txt&quot;,mode=&quot;r&quot;,newline=&quot;&quot;)
file.read()


&#39;1st line\r\n2nd line\r\n3rd line\r\n4th line\r\n5th line&#39;


file=open(&quot;test1.txt&quot;,mode=&quot;r&quot;,newline=None)
file.read()


&#39;1st line\n2nd line\n3rd line\n4th line\n5th line&#39;


file.close()</code></pre>
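    <p>
      The output above depends on the file having been saved with Windows line
      endings. A platform-independent way to reproduce it, sketched below,
      writes the <code>\r\n</code> bytes explicitly in binary mode and then
      reads them back with different <code>newline</code> values. The file name
      <code>newline_demo.txt</code> is an assumption.
    </p>

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "newline_demo.txt")

    # Write Windows-style line endings byte for byte.
    with open(path, "wb") as f:
        f.write(b"1st line\r\n2nd line\r\n")

    # None: "\r\n" is translated to "\n" on reading.
    with open(path, encoding="utf-8", newline=None) as f:
        translated = f.read()

    # "": line endings are recognized but returned untranslated.
    with open(path, encoding="utf-8", newline="") as f:
        untranslated = f.read()

    # "\n": lines end only at "\n", and the "\r" is kept as ordinary text.
    with open(path, encoding="utf-8", newline="\n") as f:
        split_on_lf = f.readlines()

print(repr(translated))    # '1st line\n2nd line\n'
print(repr(untranslated))  # '1st line\r\n2nd line\r\n'
print(split_on_lf)         # ['1st line\r\n', '2nd line\r\n']
```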
    <h3 id="encoding">Encoding</h3>
    <p>
      <code>Encoding</code> represents the character encoding, which is the
      coding system that uses bits and bytes to represent a character. This
      concept frequently pops up when you’re talking about data storage, data
      transmission, and computation.
    </p>
    <p>
      The default <code>encoding</code> is platform dependent: on Microsoft
      Windows it is typically cp1252, while on Linux it is usually UTF-8. So
      when dealing with text files, it is good practice to specify the
      character encoding explicitly. Note that binary mode doesn’t take an
      encoding argument.
    </p>
    <p>
      Earlier, you read that you can use the <code>errors</code> parameter to
      handle encoding and decoding errors and that you use
      <code>newline</code> to deal with line endings. Now, try out the
      following code to see which encoding a file object uses:
    </p>
    <pre><code>with open(&quot;test1.txt&quot;,mode=&quot;r&quot;) as file:
    print(&quot;Default encoding:&quot;,file.encoding)

with open(&quot;test1.txt&quot;,mode=&quot;r&quot;,encoding=&quot;utf-8&quot;) as file:
    print(&quot;New encoding:&quot;,file.encoding)


Default encoding: cp1252
New encoding: utf-8</code></pre>
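    <p>
      To see why specifying the encoding matters, here is a minimal sketch that
      writes a non-ASCII character as UTF-8 and then reads the same bytes back
      with a mismatched encoding. The file name
      <code>encoding_demo.txt</code> and the sample text are assumptions chosen
      for illustration.
    </p>

```python
import os
import tempfile

# Hypothetical scratch file (illustration only).
path = os.path.join(tempfile.mkdtemp(), "encoding_demo.txt")
text = "caf\u00e9"  # "cafe" with an accented final letter

# Write with an explicit encoding instead of relying on the platform default.
with open(path, mode="w", encoding="utf-8") as f:
    f.write(text)

# Reading with the same encoding round-trips the text.
with open(path, mode="r", encoding="utf-8") as f:
    round_trip = f.read()

# Reading the same bytes as ASCII fails under the default errors="strict".
try:
    with open(path, mode="r", encoding="ascii") as f:
        f.read()
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors="replace" substitutes U+FFFD for every byte that cannot be decoded.
with open(path, mode="r", encoding="ascii", errors="replace") as f:
    replaced = f.read()

print(round_trip == text, strict_failed, repr(replaced))
```

    <p>
      The accented letter takes two bytes in UTF-8, so the ASCII reader with
      <code>errors="replace"</code> yields two replacement characters for it.
    </p>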
    <h3 id="closefd">closefd</h3>
    <p>
      If <code>closefd</code> is <code>False</code> and a file descriptor,
      rather than a filename, was given, the underlying file descriptor will be
      kept open when the file is closed. If a filename is given,
      <code>closefd</code> has to be <code>True</code>, which is the
      default; otherwise, <code>open()</code> raises a <code>ValueError</code>.
      You use this argument to wrap an existing file descriptor into a real
      file object.
    </p>
    <p>
      Note that a file descriptor is simply an integer that the operating
      system assigns to an open file so that Python can request I/O operations
      on it. The method <code>.fileno()</code> returns this integer.
    </p>
    <p>
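      If you have an integer file descriptor already open for an I/O channel,
      you can wrap a file object around it as below. Note that closing the
      wrapper also closes the descriptor, so the original
      <code>file</code> object can no longer be used afterwards: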
    </p>
    <pre><code>file=open(&quot;test1.txt&quot;,&quot;r+&quot;)
fd=file.fileno()
print(&quot;File descriptor assigned:&quot;,fd)


filedes_object=open(fd,&quot;w&quot;)
filedes_object.write(&quot;Data sciences\r\nPython&quot;)
filedes_object.close()


File descriptor assigned: 6</code></pre>
    <p>
      To prevent closing the underlying file descriptor, you can use
      <code>closefd=False</code>:
    </p>
    <pre><code>file=open(&quot;test1.txt&quot;,&quot;r+&quot;)
fd=file.fileno()
print(&quot;File descriptor assigned:&quot;,fd)


filedes_object=open(fd,&quot;w&quot;,closefd=False)
filedes_object.write(&quot;Hello&quot;)
filedes_object.close()
file.close()


File descriptor assigned: 6</code></pre>
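    <p>
      The effect of <code>closefd=False</code> is easier to observe with a raw
      descriptor obtained from <code>os.open</code>, which is a swapped-in
      technique rather than the approach used above. The sketch below is one
      possible way to check that the descriptor stays usable after the wrapper
      is closed. The file name <code>closefd_demo.txt</code> is an assumption.
    </p>

```python
import os
import tempfile

# Hypothetical scratch file (illustration only).
path = os.path.join(tempfile.mkdtemp(), "closefd_demo.txt")

# os.open returns a raw integer file descriptor, not a file object.
fd = os.open(path, os.O_RDWR | os.O_CREAT)

# closefd=False: closing the wrapper flushes its data but leaves fd open.
wrapper = open(fd, "w", closefd=False)
wrapper.write("Hello")
wrapper.close()

# The descriptor is still valid, so we can rewind it and read through it.
os.lseek(fd, 0, os.SEEK_SET)
data = os.read(fd, 100)

# With closefd=False, closing the descriptor is our own responsibility.
os.close(fd)
print(data)
```

    <p>
      Had the wrapper been opened with the default <code>closefd=True</code>,
      the <code>os.lseek</code> call would fail because the descriptor would
      already be closed.
    </p>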
    <p>
      You have learned a lot about reading text files in Python, but as you have
      read repeatedly throughout this tutorial, these are not the only files
      that you can import: there are also binary files.
    </p>
    <p>But what are these binary files exactly?</p>
    <p>
      Binary files store data as sequences of <code>0</code>s and
      <code>1</code>s that are machine-readable. A byte is a group of 8 bits,
      and an ASCII character occupies one byte in memory. For example, the
      binary representation of the character ‘H’ is <code>01001000</code>, and
      converting this 8-bit binary string into decimal gives you
      <code>72</code>.
    </p>
    <pre><code>binary_file=open(&quot;binary_file.bin&quot;,mode=&quot;wb+&quot;)
text=&quot;Hello 123&quot;
encoded=text.encode(&quot;utf-8&quot;)
binary_file.write(encoded)
binary_file.seek(0)
binary_data=binary_file.read()
print(&quot;binary:&quot;,binary_data)
text=binary_data.decode(&quot;utf-8&quot;)
print(&quot;Decoded data:&quot;,text)


binary: b&#39;Hello 123&#39;
Decoded data: Hello 123</code></pre>
    <p>
      When you open a file in binary mode <code>b</code>, reading from it
      returns <code>bytes</code> objects rather than strings.
    </p>
    <p>
      If you ever need to read or write text from a binary-mode file, make sure
      you remember to decode or encode it like above. You can access each byte
      through iteration like below, and it will return integer byte values
      (decimal of the 8-bit binary representation of each character) instead of
      byte strings:
    </p>
    <pre><code>for byte in binary_data:
    print(byte)


72
101
108
108
111
32
49
50
51</code></pre>
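    <p>
      The relationship between a character, its 8-bit pattern, and its decimal
      byte value can be checked directly in Python. This is a small sketch using
      only built-in functions; the variable names are illustrative.
    </p>

```python
char = "H"

# ord() gives the integer code point of a character.
code_point = ord(char)

# format(..., "08b") renders that integer as an 8-digit binary string.
bits = format(code_point, "08b")

# int(..., 2) converts the binary string back to decimal.
back_to_decimal = int(bits, 2)

print(code_point, bits, back_to_decimal)  # 72 01001000 72

# Iterating over a bytes object yields the same kind of integer values.
values = list("Hello 123".encode("utf-8"))
print(values)

# bytes() turns the integers back into a bytes object that can be decoded.
print(bytes(values).decode("utf-8"))
```

    <p>
      The list printed here holds the same integers that the loop above prints
      one per line.
    </p>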
    <h3 id="python-file-object-attributes">Python File Object Attributes</h3>
    <p>File attributes give information about the file and file state.</p>
    <p>
      <img
        src="https://res.cloudinary.com/dyd911kmh/image/upload/f_auto,q_auto:best/v1590365454/read3_yc0ljw.png"
      />
    </p>
    <pre><code>with open(&quot;test1.txt&quot;) as file:
    print(&quot;Name of the file:&quot;,file.name)
    print(&quot;Mode of the file:&quot;,file.mode)
    print(&quot;Encoding of the file:&quot;,file.encoding)
# The with block closes the file automatically on exit
print(&quot;Closed?&quot;,file.closed)


Name of the file: test1.txt
Mode of the file: r
Encoding of the file: cp1252
Closed? True</code></pre>
    <h3 id="other-methods-of-file-object">Other Methods of File object</h3>
    <p>
      <img
        src="https://res.cloudinary.com/dyd911kmh/image/upload/f_auto,q_auto:best/v1590365454/read4_r57cm1.png"
      />
    </p>
    <p>Let’s try out all of these methods:</p>
    <pre><code>with open(&quot;mynewtextfile.txt&quot;,&quot;w+&quot;) as f:
    f.write(&quot;We are learning python\nWe are learning python\nWe are learning python&quot;)
    f.seek(0)
    print(f.read())
    print(&quot;Is readable:&quot;,f.readable())
    print(&quot;Is writeable:&quot;,f.writable())
    print(&quot;File no:&quot;,f.fileno())
    print(&quot;Is connected to tty-like device:&quot;,f.isatty())
    f.truncate(5)
    f.flush()
    f.seek(0)
    print(f.read())
# f is already closed here: the with block closes it on exit


We are learning python
We are learning python
We are learning python
Is readable: True
Is writeable: True
File no: 8
Is connected to tty-like device: False
We ar</code></pre>
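    <p>
      The interplay of <code>seek()</code>, <code>tell()</code>, and
      <code>truncate()</code> can be isolated in a shorter sketch. The file name
      <code>methods_demo.txt</code> is an assumption, and the text is kept to
      plain ASCII so that positions correspond to character counts.
    </p>

```python
import os
import tempfile

# Hypothetical scratch file (illustration only).
path = os.path.join(tempfile.mkdtemp(), "methods_demo.txt")

with open(path, "w+") as f:
    f.write("We are learning python")
    end_position = f.tell()   # position just past the last written character

    f.seek(0)                 # rewind to the start before reading
    full_text = f.read()

    f.truncate(5)             # shrink the file to its first five bytes
    f.seek(0)
    short_text = f.read()

print(end_position)
print(repr(full_text))
print(repr(short_text))
```

    <p>
      Note that <code>truncate()</code> does not move the file position, which
      is why the sketch calls <code>seek(0)</code> again before the final
      <code>read()</code>.
    </p>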
    <h3 id="handling-files-through-the-os-module">
      Handling files through the <code>os</code> module
    </h3>
    <p>
      The <code>os</code> module of Python allows you to perform operating
      system dependent operations such as making a folder, listing the contents
      of a folder, getting information about a process, ending a process, etc.
      It also has functions to view the environment variables of the operating
      system on which Python is running, and many more.
      <a href="https://docs.python.org/2/library/os.html">Here</a> is the Python
      documentation for the <code>os</code> module.
    </p>
    <p>
      Let’s see some useful <code>os</code> module methods that can help you to
      handle files and folders in your program.
    </p>
    <p>
      <img
        src="https://res.cloudinary.com/dyd911kmh/image/upload/f_auto,q_auto:best/v1590365454/read5_a1ztyh.png"
      />
    </p>
    <p>Let’s see some examples of these methods:</p>
    <pre><code>import os
os.getcwd()


&#39;C:\\Users\\hda3kor\\Documents\\Reading_and_Writing_Files&#39;


os.makedirs(&quot;my_folder&quot;)


---------------------------------------------------------------------------

FileExistsError                           Traceback (most recent call last)

&lt;ipython-input-12-f469e8a88f1b&gt; in &lt;module&gt;
----&gt; 1 os.makedirs(&quot;my_folder&quot;)


C:\Program Files\Anaconda3\lib\os.py in makedirs(name, mode, exist_ok)
    219             return
    220     try:
--&gt; 221         mkdir(name, mode)
    222     except OSError:
    223         # Cannot rely on checking for EEXIST, since the operating system


FileExistsError: [WinError 183] Cannot create a file when that file already exists: &#39;my_folder&#39;</code></pre>
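    <p>
      The <code>FileExistsError</code> above occurs because
      <code>my_folder</code> already existed when
      <code>os.makedirs</code> was called. The next code chunk creates a file
      inside <code>my_folder</code>, inspects the folder, and then tries to
      rename the file:
    </p>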
<pre><code>open(&quot;my_folder\\newfile.txt&quot;,&quot;w&quot;).close()  # create an empty file and close it right away
print(&quot;Contents of folder my_folder\n&quot;,os.listdir(&quot;my_folder&quot;))
print(&quot;---------------------------------&quot;)
print(&quot;Size of folder my_folder (in bytes)&quot;,os.path.getsize(&quot;my_folder&quot;))
print(&quot;Is file?&quot;,os.path.isfile(&quot;test1.txt&quot;))
print(&quot;Is folder?&quot;,os.path.isdir(&quot;my_folder&quot;))
os.chdir(&quot;my_folder&quot;)
os.rename(&quot;newfile.txt&quot;,&quot;hello.txt&quot;)
print(&quot;New Contents of folder my_folder\n&quot;,os.listdir(&quot;my_folder&quot;))


Contents of folder my_folder
 [&#39;hello.txt&#39;, &#39;newfile.txt&#39;]
---------------------------------
Size of folder my_folder (in bytes) 0
Is file? True
Is folder? True



---------------------------------------------------------------------------

FileExistsError                           Traceback (most recent call last)

&lt;ipython-input-13-6d2da66512fd&gt; in &lt;module&gt;
      6 print(&quot;Is folder?&quot;,os.path.isdir(&quot;my_folder&quot;))
      7 os.chdir(&quot;my_folder&quot;)
----&gt; 8 os.rename(&quot;newfile.txt&quot;,&quot;hello.txt&quot;)
      9 print(&quot;New Contents of folder my_folder\n&quot;,os.listdir(&quot;my_folder&quot;))


FileExistsError: [WinError 183] Cannot create a file when that file already exists: &#39;newfile.txt&#39; -&gt; &#39;hello.txt&#39;</code></pre>
    <p>
      If you try to create a folder, or rename a file to a name, that already
      exists, Python raises a <code>FileExistsError</code>, as the Windows
      tracebacks above show. To delete a file, you can use
      <code>os.remove(filename)</code>:
    </p>
    <pre><code>os.getcwd()
os.remove(&quot;hello.txt&quot;)</code></pre>
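    <p>
      Both errors shown above can be avoided with a couple of guards. The sketch
      below is one possible way to do this, not the only one: it passes
      <code>exist_ok=True</code> to <code>os.makedirs</code> and checks the
      target name before renaming. It works inside a temporary folder so that
      it can be rerun safely; the folder and file names mirror the ones used
      above.
    </p>

```python
import os
import tempfile

# Work inside a fresh temporary folder so reruns never collide (illustration only).
base = tempfile.mkdtemp()
folder = os.path.join(base, "my_folder")

# exist_ok=True makes a repeated call a no-op instead of raising FileExistsError.
os.makedirs(folder, exist_ok=True)
os.makedirs(folder, exist_ok=True)

old_path = os.path.join(folder, "newfile.txt")
new_path = os.path.join(folder, "hello.txt")

# The with block closes the new file as soon as the data is written.
with open(old_path, "w") as f:
    f.write("data")

before = os.listdir(folder)

# Rename only if the target name is free.
if not os.path.exists(new_path):
    os.rename(old_path, new_path)

after = os.listdir(folder)
size = os.path.getsize(new_path)

# Clean up: remove the file first, then the now-empty folders.
os.remove(new_path)
os.rmdir(folder)
os.rmdir(base)

print(before, after, size)
```

    <p>
      Building paths with <code>os.path.join</code> instead of hard-coded
      backslashes keeps the same code usable on Windows, Linux, and macOS.
    </p>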
    <h3 id="importing-flat-files-using-numpy">
      Importing flat files using NumPy
    </h3>
    <p>
      <img
        src="https://res.cloudinary.com/dyd911kmh/image/upload/f_auto,q_auto:best/v1590365454/read7_pcg8th.png"
      />
    </p>
    <p>
      NumPy (short for Numerical Python) arrays are the de facto Python
      standard for storing numerical data. They are efficient, fast, and clean.
      They are widely used in linear algebra, statistics, machine learning, and
      deep learning. NumPy arrays also act as a backbone for reading image
      datasets.
    </p>
    <p>
      NumPy also underpins packages like <code>Pandas</code> and
      <code>Scikit-learn</code>. It provides many built-in functions
      that can be leveraged to analyze and manipulate data efficiently and
      with less effort.
    </p>
    <h4 id="mnist-data">MNIST data</h4>
    <p>
      The sample MNIST <code>.csv</code> dataset can be downloaded from
      <a
        href="https://github.com/wblakecannon/DataCamp/blob/master/05-importing-data-in-python-(part-1)/_datasets/mnist_kaggle_some_rows.csv"
        >here</a
      >.
    </p>
    <p>
      You can find more information about the MNIST dataset from
      <a href="http://yann.lecun.com/exdb/mnist/">here</a> on the webpage of
      Yann LeCun.
    </p>
    <p>
      You will first import the NumPy module and then use the
      <code>loadtxt</code> function to import the MNIST data, as shown below:
    </p>
    <pre><code>import numpy as np
data = np.loadtxt(&#39;mnist.csv&#39;, delimiter=&#39;,&#39;)
print(data)


[[1. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [1. 0. 0. ... 0. 0. 0.]
 ...
 [2. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [5. 0. 0. ... 0. 0. 0.]]</code></pre>
    <p>
      If your dataset has a header with string values, you can use the
      <code>skiprows</code> parameter and skip the first row. Similarly, you can
      use the <code>usecols</code> parameter to read only some specific columns.
    </p>
    <p>
      You can also pass in <code>dtype</code>, i.e., the data type in which
      you want to import your data: integer, float, string, etc.
    </p>
    <p>
      <strong>Note</strong> that a regular NumPy array holds elements of only
      one data type, meaning it cannot have mixed data types in a single
      array.
    </p>
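    <p>
      The <code>skiprows</code>, <code>usecols</code>, and
      <code>dtype</code> parameters described above can be tried on a tiny
      in-memory table. The CSV text below is made up for illustration and is
      not taken from the MNIST file; <code>io.StringIO</code> stands in for a
      file on disk.
    </p>

```python
import io

import numpy as np

# Made-up sample data with a header row (illustration only, not MNIST).
csv_text = "label,pixel0,pixel1,pixel2\n1,0,0,255\n0,12,0,0\n5,0,7,0\n"

# skiprows=1 skips the string header; dtype=int imports the values as integers.
data = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1, dtype=int)

# usecols picks specific columns by index: here the label and the last pixel.
subset = np.loadtxt(
    io.StringIO(csv_text), delimiter=",", skiprows=1, usecols=(0, 3), dtype=int
)

print(data.shape)
print(subset)
```

    <p>
      Without <code>skiprows=1</code>, <code>loadtxt</code> would try to convert
      the header strings to numbers and raise an error.
    </p>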
    <p>Let’s check the number of rows and columns this dataset has:</p>
    <pre><code>data.shape


(100, 785)</code></pre>
    <p>
      If you would like to learn more great ways to handle data in Python then
      check out
      <a
        href="https://www.datacamp.com/community/tutorials/pandas-tutorial-dataframe-python"
        >this tutorial</a
      >.
    </p>
    <h2 id="conclusion">Conclusion</h2>
    <p>Congratulations on finishing the tutorial.</p>
    <p>
      Now you know how to handle files in Python, from creating and
      manipulating them to operating-system-level handling.
    </p>
    <p>
      You might want to try experimenting with various NumPy functionalities
      that could be leveraged to understand numerical and imagery datasets. You
      could further analyze the dataset graphically using the Matplotlib
      plotting library.
    </p>
    <p>
      If you want to learn more about importing files in Python, check out
      DataCamp’s
      <a href="https://www.datacamp.com/courses/importing-data-in-python-part-1"
        >Importing Data in Python</a
      >
      course.
    </p>
    <p>
      <a
        href="https://www.datacamp.com/community/tutorials/reading-writing-files-python"
        >Source</a
      >
    </p>
  </body>
</html>
